Abstract

Specific peptide ligand recognition by modular interaction domains is essential for the fidelity of information flow 
through the signal transduction networks that control cell behavior in response to extrinsic and intrinsic stimuli. Src 
homology 2 (SH2) domains recognize distinct phosphotyrosine peptide motifs, but the specific sites that are
phosphorylated and the complement of available SH2 domains vary considerably in individual cell types. Such
differences are the basis for a wide range of available protein interaction microstates from which signaling can 
evolve in highly divergent ways. This underlying complexity suggests the need to broadly map the signaling 
potential of systems as a prerequisite for understanding signaling in specific cell types as well as various 
pathologies that involve signal transduction such as cancer, developmental defects and metabolic disorders. This 
report describes interactions between SH2 domains and potential binding partners that comprise initial signaling 
downstream of activated fibroblast growth factor (FGF), insulin (Ins), and insulin-like growth factor-1 (IGF-1)
receptors. A panel of 50 SH2 domains screened against a set of 192 phosphotyrosine peptides defines an extensive 
potential interactome while demonstrating the selectivity of individual SH2 domains. The interactions described 
confirm virtually all previously reported associations while describing a large set of potential novel interactions that 
imply additional complexity in the signaling networks initiated from activated receptors. This study of pTyr ligand 
binding by SH2 domains provides valuable insight into the selectivity that underpins complex signaling networks 
that are assembled using modular protein interaction domains. 



Lay abstract 

Every cell in our body is an immensely powerful computa- 
tional device capable of integrating vast amounts of data 
from intrinsic and extrinsic cues and responding with remarkable
fidelity. What underlies this computational
power is not static wires, but dynamic interactions that
leverage the finite number of genes to generate an almost 
infinite number of combinatorial interactions between 
protein components. In the post-genomics era, mapping 
these interactions represents a next frontier. The sum 

* Correspondence: pdnash.uchicago@gmail.com 

1 Ben May Department for Cancer Research, The University of Chicago,

Chicago, IL 60637, USA 

2 Committee on Cancer Biology, The University of Chicago, Chicago, IL 60637, 
USA 

Full list of author information is available at the end of the article 



total of all permitted interactions is referred to as the 
potential interactome. In any given cell, only a subset of 
potential interactions will be enabled and this defines the 
selective differences in signalling between tissues. Under- 
standing the whole provides insight into the information 
processing power of the system and may suggest new ave- 
nues for therapeutic intervention to treat diseases caused 
by faults in signal processing mechanisms. This study out- 
lines the potential interactome for initial signalling events 
from the insulin receptor, insulin-like growth factor 
receptor and all four members of the fibroblast growth 
factor receptor family. These systems are essential for 
human development and dysfunctional signalling has 
been implicated in a wide range of human diseases 
including diabetes, many cancers, Alzheimer's disease, 
many developmental disorders and even aging. Binary
connections are reported between 50 SH2 domain-
containing proteins and 192 phosphopeptide nodes on
13 signal-initiating proteins. This verifies almost every
interaction described in the past 25 years and adds
extensive new data, providing a step towards fathoming
the intricacies of differential cell communication
between various tissues and disease states.

Introduction 

Signaling immediately downstream of receptor tyrosine 
kinases (RTKs) is accomplished in large part by the re- 
cruitment of phosphotyrosine (pTyr) interacting proteins 
to sites of tyrosine phosphorylation on the activated 
receptors and their associated scaffold proteins [1-3]. A 
given RTK may contain on the order of 10-20 phos- 
phorylatable tyrosine residues with additional sites avail- 
able on associated scaffold proteins resulting in a large 
number of potential sites for recruiting binding partners. 
The majority of phosphotyrosine interacting proteins 
contain a conserved Src homology 2 (SH2) domain [4]. 
The SH2 domain is the classic archetype for the large 
family of modular protein interaction domains that serve 
to organize a diverse array of cellular processes [5,6]. 
SH2 domains interact with phosphorylated tyrosine- 
containing peptide sequences [7-11] and in doing so 
they couple activated protein tyrosine kinases (PTKs) to 
intracellular pathways that regulate many aspects of cel- 
lular communication in metazoans [12,13]. The human 
genome encodes 111 SH2 domain proteins [14,15] that 
represent the primary mechanism for cellular signal 
transduction immediately downstream of PTKs. As one 
might expect, SH2 domain proteins play an essential role 
in development and have been linked to a wide array of 
human malignancies including cancers, diabetes, and 
immunodeficiencies [14,16].

Despite the importance of SH2-mediated signaling in 
human disease, our understanding of their interactions 
remains far from complete. Direct experimental measure- 
ment of binding partners has typically focused on specific 
interactions driven by hypotheses relating to the precise 
signaling events under investigation. This yields a set of 
high quality, but inevitably sparse data. Certain pTyr pro- 
teins and SH2 domains are extensively studied while others 
are more arcane. Nonetheless, the SH2-mediated interac- 
tions reported over 25 years of intensive study provide a 
solid foundation for validating high-throughput datasets. 

SH2 domain interactions are almost always phosphor- 
ylation dependent as roughly half of the binding energy 
is devoted to pTyr recognition [17,18]. Despite this, SH2 
domains preserve substantial specificity for peptide 
ligands, recognizing residues adjacent to the pTyr, par- 
ticularly those at positions +1 to +5 C-terminal to the 
critical pTyr [19-21]. This is achieved in part by use of 



complex recognition events that effectively combine the 
use of motifs and sub-motif modifiers [11]. Specifically, 
SH2 domains recognize targets not only through permis- 
sive residues adjacent to the phosphotyrosine that con- 
stitute binding motifs, but also by making use of 
contextual sequence information and non-permissive 
residues [22] to define highly selective interactions with 
physiological peptide ligands. The specificity of SH2 
domains enables their use as tools to profile the global 
phosphotyrosine state of cells or tissues [23-27], without 
a priori knowledge of the specific target proteins or pep- 
tides. Profiling signaling using SH2 domains has direct 
implications for diagnosis and for guiding therapeutic decisions,
as the patterns obtained can be used to classify
tumors [27]. The ligand specificity of many SH2 
domains has been evaluated using approaches including 
synthetic peptide libraries [19,28,29], oriented peptide li- 
braries [20,30] and phage display [31]. Information of 
this type is often described by position-specific scoring 
matrices (PSSM), and allows programs such as ScanSite 
and Scoring Matrix-Assisted Ligand Identification 
(SMALI) to predict potential binding motifs [20,21]. 
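
A PSSM of the kind mentioned above can be applied by summing per-position scores over the residues C-terminal to the pTyr. The sketch below is illustrative only: the matrix values, the function name `pssm_score`, and the restriction to positions +1 to +3 are assumptions made for this example, and it does not reproduce the ScanSite or SMALI implementations.

```python
# Hypothetical PSSM: scores for residues at positions +1..+3 C-terminal to pTyr.
# All numbers are invented for illustration and are not measured values.
EXAMPLE_PSSM = {
    1: {"V": 1.0, "I": 0.75},
    2: {"N": 2.0},
    3: {"V": 0.5, "I": 0.25},
}


def pssm_score(c_terminal_residues, pssm, default=0.0):
    """Sum the matrix score of each residue following the pTyr.

    c_terminal_residues: string of residues starting at position +1.
    Residues or positions absent from the matrix contribute `default`.
    """
    total = 0.0
    for position, residue in enumerate(c_terminal_residues, start=1):
        total += pssm.get(position, {}).get(residue, default)
    return total


if __name__ == "__main__":
    # Compare two candidate C-terminal flanks against the toy matrix.
    for flank in ("VNV", "AQV"):
        print(flank, pssm_score(flank, EXAMPLE_PSSM))
```

A higher score indicates a closer match to the toy motif; in practice a score cutoff would have to be calibrated against known binders.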

Recruitment of SH2 domain proteins to phosphorylated 
sites is a dynamic process and is by no means predeter- 
mined by the phosphorylation event alone. Each tyrosine 
site on a scaffold (including sites on receptors that recruit 
SH2 domains) can be phosphorylated or unphosphory- 
lated. The phosphorylated site can either be free or occu- 
pied by one of its potential binding partners. Each possible 
assembly of interaction partners on a given scaffold represents
an interaction microstate [32-35]. The set of populated
interaction microstates from which signaling
develops is a function of many factors, including protein
expression levels, local concentration, and the probability 
that a given site is phosphorylated. Thus, distinct signaling 
networks may originate from the same scaffold or recep- 
tor in different cell types. This is also true under condi- 
tions of aberrant expression of signaling components that 
are a common occurrence in pathologies such as cancer. 
Thus, accurate and well-annotated potential interactomes 
that represent the aggregate available interaction micro- 
states are a valuable resource that opens the door to inter- 
preting studies of signaling in different cell types or under 
conditions of altered protein expression. As the Human 
Protein Atlas detailing subcellular localization data and 
expression data makes clear, cell lines and tissues vary 
widely and often in unanticipated ways in terms of protein 
expression [36]. All of this suggests that detailed potential 
interactomes may provide substantial benefit in under- 
standing cell-type specific signaling. 
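
The size of the microstate space described above can be illustrated with a simple count. The sketch below assumes, for illustration only, that sites behave independently and that each phosphorylated site holds at most one partner at a time, so a site with n potential partners has n + 2 states (unphosphorylated, phosphorylated and free, or bound by one of n partners). The function name and the example numbers are hypothetical.

```python
from math import prod


def count_microstates(partners_per_site):
    """Count interaction microstates for one scaffold.

    partners_per_site: iterable with the number of potential SH2 partners
    for each tyrosine site. Each site contributes (2 + n) states:
    unphosphorylated, phosphorylated-and-free, or occupied by one of n partners.
    Sites are treated as independent (an assumption of this sketch).
    """
    return prod(2 + n for n in partners_per_site)


if __name__ == "__main__":
    # A toy scaffold with three sites having 3, 1 and 0 potential partners.
    print(count_microstates([3, 1, 0]))
```

Even this toy scaffold with three sites yields 5 x 3 x 2 = 30 microstates, which shows how quickly the space grows as sites and partners are added.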

Herein, we describe a potential interactome obtained 
using addressable peptide arrays consisting of 192 
physiological peptides from the insulin (Ins), insulin-like
growth factor 1 (IGF-1) and fibroblast growth factor
(FGF) signaling pathways to identify interactions with
50 SH2 domains. This set represents a broad sam- 
pling of the SH2 domains extant in the human gen- 
ome. The results of this study map a range of potential 
phosphotyrosine-dependent interactions within the FGF 
and Ins/IGF-1 pathways. These signaling systems have 
relevance to understanding complex multi-tissue patholo- 
gies such as diabetes and cancer as well as in normal 
physiology and development. This study confirms 44 of 54 
previously described interactions. In addition, we report 
an extensive set of novel interactions. Validation of 60 bin- 
ary interaction pairs was conducted using the orthogonal 
method of solution binding measured by fluorescence 
polarization. The binding motifs obtained for each SH2 
domain closely match those reported in a number of inde- 
pendent studies. Protein co-precipitation experiments, or 
endogenous phosphorylation upon receptor stimulation, 
were further used to validate a number of interactions. 
The results of this study highlight the available pool of po- 
tential SH2-mediated interactions with these 13 major 
signaling proteins and serve as a first step in under- 
standing signaling microstate variations. Interactive 
figures and additional information may be found at 
http://www.sh2domain.org.

Results 

Peptide arrays for SH2 interactions within the FGF/Ins/
IGF-1 signaling pathways

The use of addressable peptide arrays is a reproducible 
and semi-quantitative approach that has been exten- 
sively validated for studying protein interactions with 
peptide ligands [37-39]. To investigate connections be- 
tween SH2 domain proteins and their putative phos- 
phorylated docking sites on cell surface receptors, we 
developed addressable arrays consisting of 192 phospho- 
tyrosine peptides. This peptide set was assembled using 
71 phosphotyrosine peptide motifs corresponding to all
of the cytoplasmic tyrosine residues within the FGF 
receptors (FGFR1-4), insulin receptor (InsR) and IGF-1 
receptor (IGF-1R) (Figure 1A). Activation of these recep- 
tors results in the phosphorylation of associated scaffold 
proteins, and so 75 phosphotyrosine peptides corre- 
sponding to a comprehensive list of tyrosine residues 
within insulin receptor substrates (IRS-1 and IRS-2) and 
fibroblast receptor substrates (FRS-2 and FRS-3) were 
included. In addition, 33 phosphotyrosine peptides were 
incorporated from the downstream signaling proteins 
PLC-γ1, p130Cas (BCAR1) and p62DOK1. Finally, a set
of 12 positive control peptides corresponding to 19
reported interactions with 15 SH2 domains for which
equilibrium dissociation constant (K_D) values span a
range from low nM to 50 µM were incorporated to aid
in validating the results. These control peptides provide
a reference and establish the empirical cut-off for 



designated binding interactions (Table 1). No discrimin- 
ation was made against peptides on the basis of reported 
phosphorylation state in order to examine a diverse and 
unbiased set of motifs. The resulting set of 192 phospho- 
tyrosine peptides and their corresponding position in the 
proteins of origin is noted in Additional file 1: Table S1.
Addressable arrays were synthesized as membrane- 
bound 11-mer peptides using the SPOT synthesis tech- 
nique [40-42]. While the majority of SH2 domains 
recognize residues C-terminal to the phosphotyrosine in 
their cognate peptide ligands, additional contacts be- 
tween SH2 domains and residues N-terminal to the 
phosphotyrosine are observed for the SH2 domain of 
Sh2d1a (SAP) [43] and cannot be ruled out in other
cases. Peptides were synthesized with six flanking resi- 
dues C-terminal to the phosphotyrosine and four resi- 
dues N-terminal to the phosphotyrosine. 
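
The 11-mer layout described above (four residues N-terminal and six residues C-terminal to the phosphotyrosine) can be sketched as a simple window extraction. The function name `ptyr_window` and the decision to raise an error for tyrosines too close to a protein terminus are assumptions of this example; the source does not state how such sites were handled.

```python
def ptyr_window(sequence, tyr_index, n_term=4, c_term=6):
    """Return the peptide window around a tyrosine, written with 'pY'.

    sequence:  protein sequence in one-letter code.
    tyr_index: 0-based index of the tyrosine of interest.
    n_term / c_term: number of flanking residues on each side
    (4 and 6 give the 11-mer layout described in the text).
    """
    if sequence[tyr_index] != "Y":
        raise ValueError("residue at tyr_index is not a tyrosine")
    start = tyr_index - n_term
    end = tyr_index + c_term + 1
    if start < 0 or end > len(sequence):
        # Handling of sites near a terminus is an assumption of this sketch.
        raise ValueError("tyrosine is too close to a terminus for a full window")
    return sequence[start:tyr_index] + "pY" + sequence[tyr_index + 1:end]


if __name__ == "__main__":
    # Toy fragment built around the IRS-1 Y896 peptide listed in Table 1.
    fragment = "AASPGEYVNIEFGSS"
    print(ptyr_window(fragment, fragment.index("Y")))
```

For the toy fragment shown, the window corresponds to the IRS-1 (Y896) peptide SPGEpYVNIEFG from Table 1.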

To assess the potential network of SH2 domain inter- 
actions we selected 50 SH2 domains representing 28 of 
the 38 families of SH2 domains (Figure 1B) all of which
we have previously shown can be expressed and purified 
[23]. These include a number of extensively studied SH2 
domains (Src, Grb2, PLCγ), as well as a number of less
studied SH2 domains from proteins such as Shd, She,
Shf, Slnk (Sh2d6), Sh2d1a (SAP), Sh2d1b (Eat-2), and
Brdg1. To address potential variability in specificity
within families we employed all members from the SHB,
CRK, GRB2, SRC and ABL families (families are indicated
in fully capitalized lettering).

SH2 domains were arrayed as GST fusion proteins and 
detected using anti-GST primary antibodies and near- 
infrared labeled secondary antibodies. In an effort to 
present a dataset with minimal false positives, we chose 
an empirical cutoff based on the array average across all 
peptide spots to classify interactions (Figure 1A). Spots
for which the signal intensity of an individual
SH2 domain binding event exceeded three times the mean
intensity of all the peptides on the membrane were
scored as "array positives" [22]. Non-binding was judged
in cases where the intensity of a spot was less than the
mean intensity of all spots on the membrane and these
were scored as "array negatives". Peptides with signal
intensities between 1X and 3X the mean were scored as
"indeterminate" and classed as neither array-positive
binding interactions nor array-negative non-binders.
Analysis of the distribution of SH2 domain interactions 
per phosphopeptide revealed that our dataset possessed 
a bimodal distribution, with a significant number of pep- 
tides binding to many SH2 domains (Additional file 2: 
Figure S2). This signature may be indicative of promis- 
cuity differences between phosphopeptides or there may 
be a subset of peptides which interact in a nonspecific 
fashion with either the GST fusion tag or one of the 
antibodies used for detection, resulting in false positives. 
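
The scoring rule described above can be summarized in a short sketch. The function and variable names are hypothetical, and the intensities in the usage example are invented; only the 3X and 1X mean cut-offs come from the text.

```python
def classify_spots(intensities, positive_fold=3.0, negative_fold=1.0):
    """Classify peptide spots on one membrane relative to the array mean.

    intensities: dict mapping peptide identifier -> signal intensity.
    Returns a dict mapping peptide identifier -> call, where
      > positive_fold x mean  -> "array positive"
      < negative_fold x mean  -> "array negative"
      otherwise               -> "indeterminate"
    """
    mean = sum(intensities.values()) / len(intensities)
    calls = {}
    for peptide, value in intensities.items():
        if value > positive_fold * mean:
            calls[peptide] = "array positive"
        elif value < negative_fold * mean:
            calls[peptide] = "array negative"
        else:
            calls[peptide] = "indeterminate"
    return calls


if __name__ == "__main__":
    # Invented intensities for six spots on a toy membrane (mean = 21.5).
    toy = {"pep1": 100, "pep2": 1, "pep3": 1, "pep4": 1, "pep5": 1, "pep6": 25}
    for peptide, call in classify_spots(toy).items():
        print(peptide, call)
```

Because the cut-offs are relative to the array mean, removing spots from the mean (for example, nonspecific binders) changes the calls for the remaining spots.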

Figure 1 Probing interactions between SH2 domains and physiological peptide ligands at a systems level. (A) A representation of a SPOT
peptide array containing 192 phosphotyrosine peptides including control peptides (black) and peptides from the 13 proteins present on the array,
indicated by their respective colors. SPOT peptide arrays were incubated with 250 nM GST-SH2 domain as indicated. Interactions were detected
using anti-GST antisera and Alexa-680-labeled anti-mouse secondary antibody and the intensity of signals recorded using LiCor Odyssey. (B)
Neighbor-joining tree of all 121 SH2 domains. Highlighted in blue are the 50 SH2 domains selected across different families for this study. (C)
Peptide arrays using SPOT synthesis are a semi-quantitative method for measuring protein domain-pTyr peptide interactions. The dissociation constants (K_D)
were measured for 60 interaction pairs representing interactions determined using peptide arrays as greater than 3X the mean, between 1X
and 3X the mean, and less than 1X the mean. The mean K_D value for each group is marked with a black line.



Consistent with our goal of reducing the errors associated
with false positives, we probed three
separate arrays with three separate preps of the GST fusion
tag alone. Potentially nonspecifically interacting peptides
(so-called 'sticky' peptides) were identified as any
that bound to GST with above-mean intensity in two
out of three separate trials. This approach identifies any
peptides that interact with GST or either of the recognition
antibodies, a known confounding factor for downstream
analysis [44]. This conservative approach allows
us to score as 'binders' many significant peptides that
would have been indeterminate had the 'sticky' peptides
been included in the array average. This resulted
in discarding 40 peptides representing 382 potential
interaction pairs as non-selective and yielded a dataset
of substantially higher quality.
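
The 'sticky' peptide rule (above-mean signal with GST alone in at least two of three trials) can be sketched as follows. The function name and the toy intensities are hypothetical; only the above-mean criterion and the two-of-three rule come from the text.

```python
def sticky_peptides(gst_trials, min_hits=2):
    """Return peptides that bind GST alone in at least `min_hits` trials.

    gst_trials: list of dicts, one per GST-only array, each mapping
                peptide identifier -> signal intensity.
    A peptide counts as a hit in a trial when its intensity is above
    that trial's mean intensity.
    """
    hits = {}
    for trial in gst_trials:
        mean = sum(trial.values()) / len(trial)
        for peptide, value in trial.items():
            if value > mean:
                hits[peptide] = hits.get(peptide, 0) + 1
    return {peptide for peptide, n in hits.items() if n >= min_hits}


if __name__ == "__main__":
    # Invented GST-only intensities for three trials of a three-peptide array.
    trials = [
        {"pep1": 10, "pep2": 1, "pep3": 1},
        {"pep1": 9, "pep2": 2, "pep3": 1},
        {"pep1": 1, "pep2": 8, "pep3": 0},
    ]
    print(sorted(sticky_peptides(trials)))
```

Peptides returned by such a filter would be excluded before the array mean is computed, so that they do not inflate the threshold for the remaining spots.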

Validation by orthogonal assays and literature-verified 
interactions 

To verify the binding results obtained from addressable 
peptide arrays we employed an orthogonal method of 
determining SH2 interactions with peptide ligands. We 
measured the dissociation constants of 60 binary SH2- 
peptide pairs in solution by fluorescence polarization 

Table 1 Literature-confirmed interactions. 39 array-positive interactions were experimentally verified or confirm
previously reported interactions, while 23 array-negative interactions empirically suggest a threshold corresponding to
a K_D of approximately 5 to 10 µM for this data set.

Peptide sequence | Protein/position | Expected partner | Array positive | Affinity (K_D)/relative affinity (IC50) | References | Comments and other SH2 domains bound
ATDDpYAVPPPR | p62DOK1 (Y409), similar to p130Cas (Y410) | Crk | Yes | N.A. | [87] | Crkl, Hck
[illegible] | [illegible] | Crk | Yes | N.A. | [illegible] | Crkl
 | | Hck | Yes | N.A. | [89,90] | Fgr, Src, Yes
AEDVpYDVPPPA | p130Cas (Y362) | Crk | Yes | K_D = 0.545 µM | [87] | Src, Hck, Fyn
GLDEpYDEVPMP | B3AT (Y921)* | Nck1 | Yes | K_D = 0.06 µM | [91] | *Similar to the TIR10 peptide (EHIpYDEVAAD). Fer
DDPSpYVNVQNL | ShcA (Y426) | Grb2 | Yes | K_D = 23 nM | [92,93] | Gads, Grap
 | | Grb2 | Yes | K_D = 53 ± 8 nM | [94] |
ADNDpYIIPLPD | PDGFRb (Y1021) | Plcγ-N | Yes | K_D = 0.65-2.2 nM¹ | [95] | ¹Tandem SH2 domains of PLCG1 were tested against a tandem phosphopeptide of PDGFRβ (Y1009/Y1021).
 | | Plcγ-C | Yes | K_D = 0.65-2.2 nM¹ | [95-97] | Brk
 | | Plcγ-C | Yes | K_D = 4.1 ± 0.8 µM | [98] |
 | | Vav1 | Yes | N.A. | [99] |
 | | PI3K_N | No | ID50 = 45 ± 14 µM | [79] | Below Threshold
SLTIpYAQVQKA | SLAM (Y280) | Sh2d1b | Yes | K_D = 131 nM | [100] |
HDGLpYQGLSTA | CD150 (Y142) | Shd | No | K_D = 50 µM | [101] | Below Threshold
[illegible] | gp130/IL-6 Receptor (Y759) | Ptpn11 | No | N.A. | [102] | PTPN11_N is 1.33X Mean, PTPN11_C is [illegible]X Mean; Brdg1, Brk, Sh2d1b
[illegible] | Middle T-antigen (Y323) | [illegible] | Yes | K_D = [illegible] µM | [illegible] | [illegible]
 | | [illegible] | Yes | K_D = [illegible] µM | |
[illegible] | [illegible] | Nck1 | Yes | N.A. | [104] | [illegible]
TRDIpYETDpYpYR | InsR (Y1185,89,90) | Plcγ_C | Yes | N.A. | [105] | Crk, Crkl, Fer, Grb7, PI3K1_C
EDLSpYGDVPPG | IRS-1 (Y151) | Nck1 | No | N.A. | [106] | 1.76X Mean. Abl1, Blk, Fyn, Lck, Lyn, Sh2d1b, Shc1, Ship2, Slnk, Yes
ELSNpYICMGGK | IRS-1 (Y465) | Ptpn11_N | No | IC50 = 48 ± 16 µM | [107] | Below Threshold, 0.9X Mean
SIEEpYTEMMPA | IRS-1 (Y551) | Ptpn11_N | No | IC50 = 11 ± 1.0 µM | [107] | Below Threshold, 0.56X Mean
GSGDpYMPMSPK² | IRS-1 (Y612) | PI3K1_N | No | ID50 = 0.7-1.1 µM | [79] | ²Peptide Y632 (GSGDpYMPMSPK) on IRS-1 is similar to Y612 (TDDGpYMPMSPG) on IRS-1.
DPNGpYMMMSPS | IRS-1 (Y662) | Ptpn11_N | No | IC50 = 96 ± 13 µM | [107] | Below Threshold, 0.54X Mean
SPGEpYVNIEFG | IRS-1 (Y896) | Ptpn11_N | Yes | IC50 = 4.8 ± 1.0 µM | [107,108] | Abl2, Blk, Dapp1, Grb7, Itk, Mist, PI3K1_N, PI3K1_C, PTPN11_N, PLCγ_C, Rasa1_N, Rasa1_C, Sh2b, Sh2d1b, Shb, Shf, Shd, She, Syk_C, Vav1, Yes
 | | Grb2 | Yes | K_D = 35 nM | [92,108,109] | Gads, Grap
APVSpYADMRTG | IRS-1 (Y1012) | Ptpn11_N | No | K_D = 110 ± 23 µM | [107] | Below Threshold, 0.59X Mean
NGLNpYIDLDLV | IRS-1 (Y1179) | Ptpn11_N | Yes³ | K_D = 3.0 ± 0.60 nM | [95,108,110] | ³Tandem SH2 domains of PTPN11 were used to bind to the tandem motif of IRS-1
 | | Ptpn11_N | Yes | IC50 = 1.1 ± 0.5 µM | [107] | Abl1, Ptpn11_N, Plcγ_C, Rasa1_N, Shb, Shf, Shd, She, Yes
 | | Fyn | No | N.A. | [46] | 0.45X Mean
DLSApYASISFQ | IRS-1 (Y1229) | Ptpn11_N | No | IC50 = 25 ± 4.2 µM | [107] | Below Threshold, 2.78X Mean
 | | Ptpn11_C | No | N.A. | [108] | 1.59X Mean
 | | Fyn | No | N.A. | [46] | 0.48X Mean
GGEFpYGYMTMD | IRS-2 (Y540) | Plcγ_C | Yes | N.A. | [111] | Dapp1, Grb7
PNGDpYLNVSPS | IRS-2 (Y766) | Grb2 | Yes | N.A. | [112] | Sh2d1b, Vav1
SNQEpYLDLSMP | FGFR1 (Y766) | Shb | No | N.A. | [113] | Similar peptide to Y760 of FGFR3 but has weak binding, 0.45X Mean
 | | Plcγ | No | N.A. | [55] | PLCγ_N: 0.52X Mean; PLCγ_C: 0.21X Mean
THDLYMIMREA | FGFR3 (Y724) | Sh2b | No | N.A. | [114] | 0.44X Mean
STDEpYLDLSAP | FGFR3 (Y760) | Sh2b | No | N.A. | [114] | 0.46X Mean
VSEEYLDLRLT | FGFR4 (Y754) | Plcγ | No | N.A. | [115] | SHD
QVHTpYVNTTGV | FRS2 (Y196) | Grb2 | Yes | N.A. | [116] | Abl2, Gads, Grap, PI3K1_C
NKLVpYENINGL | FRS2 (Y306) | Grb2 | Yes | N.A. | [116] | Grap, Grb7, Sh2d2a
ALLNpYENLPSL | FRS2 (Y349) | Grb2 | Yes | N.A. | [116] | Abl1, Abl2, Gads, Grap, Grb2, Grb7, PI3K1_C, Sh2d2a, Sh3bp2
PMHNpYVNTENV | FRS2 (Y392) | Grb2 | Yes | N.A. | [116] | Gads, Grap
RQLNpYIQVDLE | FRS2 (Y436) | Ptpn11_N | Yes | N.A. | [117] | Itk, Mist, PLCγ_C, Rasa1_N, Shb, Shd, She, Syk_C
NPGFpYVEANPM | PLCγ1 (Y783) | Plcγ_C | No | N.A. | [118] | PLCγ_C at 1.03X mean; Zap70_N
EQDEYDIPRHL | p130Cas (Y234) | Crk | Yes | N.A. | [87] | Brk, Crkl, Fyn, Lck, Lyn, Shc1, Yes
PQDIYDVPPVR | p130Cas (Y249) | Crk | Yes | N.A. | [87] | Crkl, Zap70_N
WMEDpYDYVHLQ | p130Cas (Y664) | Nck1 | Yes | N.A. | [87,119] | Bcar3, Brk, Crk, Crkl, Dapp1, Fer, Grb7, Matk, PI3K1_N, Rasa1_N, Rasa1_C, Sh3bp2
 | | Bmx | Yes | N.A. | [120] | Itk
 | | Src | No | K_D = 25-46 nM | [119,121] | 2.45X mean
 | | Lck | No | N.A. | [121] | 2.03X mean
PPALpYAEPLDS | p62DOK1 (Y296) | Rasa1_N | Yes | N.A. | [122] | Abl1, Nck1, Vav1, Zap70_N
 | | Rasa1_C | Yes | N.A. | [122] |
QDSLpYSDPLDS | p62DOK1 (Y315) | Rasa1_C | Yes | N.A. | [122] | Abl1, Blk, Crk, Lck, Lyn, Nck1, Src, Zap70_N
EDPIpYDEPEGL | p62DOK1 (Y362) | Nck1 | Yes | N.A. | [123] | Blk, Hck, Lck, Lyn, Shc1, Src
 | | Abl1 | Yes | N.A. | [124] |
KEEGpYELPYNP | p62DOK1 (Y398) | Rasa1_C | Yes | N.A. | [123] | Abl1, Blk, Brk, Fgr, Fyn, Hck, Lck, Lyn, Nck1, Sh2d1b, Shd, Ship2, Src, Vav1, Yes
 | | Rasa1_N | Yes | N.A. | [123] |

A set of control peptides with previously reported SH2 domain targets was included on each array. We further identified a set of literature-reported interactions
between specific peptides present on the arrays and SH2 domains used in this study. Peptide sequence is indicated along with the source protein and
corresponding position of the relevant phosphotyrosine residue. SH2 domains that were expected to bind to each peptide are noted along with the observed
array-positive status. Measured equilibrium dissociation values (K_D) or relative affinity (IC50) values reported in the literature are noted. Additional array-positive
SH2 domains identified as interacting with each peptide are also indicated along with explanatory comments.
N.A. - No Affinity Determined, # - IC50 Relative Affinities.

Table 2 Measured affinity values

Peptide sequence | Protein/position | Expected partner | Array positive | Affinity (K_D)/relative affinity (IC50) | References | Comments and other SH2 domains bound
AEDVpYDVPPPA | p130Cas (Y362) | Abl2 | No | K_D = 14 µM | This Study |
 | | Crk | Yes | K_D = 0.35 µM | This Study; [87] | Fyn, Hck, Src
 | | CrkL | Yes | K_D = 0.99 µM | This Study |
 | | Nck1 | Yes | K_D = 0.93 µM | This Study |
 | | Ptpn11_N | No | K_D > 50 µM | This Study |
 | | Shd | No | K_D = 32.8 µM | This Study |
 | | Ship2 | No | K_D = 16.5 µM | This Study | Below Threshold (1.91X Mean)
SPGEpYVNIEFG | IRS-1 (Y896) | Abl1 | No | K_D = 7.48 µM | This Study | Blk, Dapp1, Fgr, Grb7, Itk, Mist, Pi3k1_N, Pi3k1_C, Plcg1_C, Ptpn11_N, Rasa1_N, Rasa1_C, Sh2b, Sh2d2a, Shb, Shf, Shd, She, Syk_C, Vav1, Yes
 | | Abl2 | No | K_D = 3.66 µM | This Study |
 | | Crk | No | K_D > 20 µM | This Study |
 | | Grb2 | Yes | K_D = 0.8 µM; K_D = 35 nM | This Study; [92,108,109] | Gads, Grap
 | | PI3K1_N | Yes | K_D = [illegible] µM | This Study |
 | | Plcg2_C | Yes | K_D = 2.92 µM | This Study |
 | | Ptpn11_N | Yes | K_D = 2.08 µM; IC50 = 4.8 ± 1.0 µM | This Study; [107,108] |
 | | [illegible] | [illegible] | K_D = [illegible] µM | This Study |
 | | Src | No | K_D = 2.21 µM | This Study |
 | | Tenc1 | No | K_D = 24 µM | This Study |
NGLNpYIDLDLV | IRS-1 (Y1179) | Abl1 | No | K_D = 22.2 µM | This Study | Itk, Plcg1_C, Rasa1_N, Sh2b, Shb, Shf, Shd, She, Yes
 | | CrkL | No | K_D > 50 µM | This Study |
 | | Grb2 | No | K_D > 50 µM | This Study |
 | | PI3K1_N | No | K_D = 15 µM | This Study |
 | | Ptpn11_N | Yes³ | K_D = 3.0 ± 0.60 nM | [95,108,110] | ³Tandem SH2 domains of Ptpn11 were used to bind to the tandem motif of IRS-1
 | | Ptpn11_N | Yes | IC50 = 1.1 ± 0.5 µM | [107] |
 | | [illegible] | [illegible] | K_D = [illegible] µM | This Study |
[illegible] | FGFR1 (Y463) | [illegible] | [illegible] | K_D = [illegible] µM | This Study |
 | | Crk | No | K_D = 44 µM | This Study |
 | | CrkL | No | K_D > 50 µM | This Study |
 | | Itk | Yes | K_D = 2.74 µM | This Study |
 | | Nck1 | Yes | K_D = 2.45 µM | This Study |
 | | Rasa1_N | No | K_D = 1.54 µM | This Study |
 | | Rasa1_C | No | K_D > 17 µM | This Study |
 | | Src | No | K_D = 6.85 µM | This Study |
 | | Vav1 | Yes | K_D = 1.87 µM | This Study |
STDEpYLDLSAP | FGFR3 (Y760) | Abl2 | No | K_D = 27 µM | This Study |

[Peptide and protein/position cells for the rows below did not survive extraction; only the position label "(Y398)" and the trailing comment fragment "Ship2 Src Vav1 Yes" remain.]

 | | PI3K1_C | No | K_D = 2.14 µM | This Study |
 | | Plcg2_N | No | K_D > 50 µM | This Study |
 | | Plcg2_C | No | K_D = 7.49 µM | This Study |
 | | Ptpn11_N | No | K_D = 25 µM | This Study |
 | | Sh2d1b | No | K_D = 5.3 µM | This Study |
 | | Shb | No | K_D > 50 µM | This Study |
 | | Rasa1_C | No | K_D > 17 µM | This Study |
 | | Abl1 | Yes | K_D = 5.77 µM | This Study |
 | | Abl2 | Yes | K_D = 3.47 µM | This Study |
 | | Blk | Yes | K_D = [illegible] µM | This Study |
 | | [illegible] | No | K_D = [illegible] µM | This Study |
 | | Fer | [illegible] | K_D = [illegible] µM | This Study |
 | | Fgr | Yes | K_D = [illegible] µM | This Study |
 | | [illegible] | Yes | K_D = [illegible] µM | This Study |
 | | Nck1 | Yes | K_D = 4.37 µM | This Study |
 | | PI3K1_N | Yes | K_D = [illegible] µM | This Study |
 | | PI3K1_C | No | K_D = [illegible] µM | This Study |
 | | Ptpn11_N | No | K_D [illegible] µM | This Study |
 | | Rasa1_N | Yes | K_D = 0.41 µM; N.A. | This Study; [123] |
 | | Rasa1_C | Yes | K_D = 1.39 µM; N.A. | This Study; [123] |
 | | Sh3bp2 | No | K_D = 1.68 µM | This Study |
 | | Shd | Yes | K_D = 5.74 µM | This Study |
 | | Src | Yes | K_D = 5.06 µM | This Study |
 | | Vav1 | Yes | K_D = 4.73 µM | This Study |

Select SH2-peptide interaction pairs were confirmed by fluorescence polarization solution-binding. Additional affinity values from published sources have been
included and listed accordingly.



(Table 2, Additional file 2: Figures S3A-C). In all cases
array-positive interactions were of high affinity (range
0.18 µM - 5.8 µM, median K_D = 2 µM), while array-negative
interactions were demonstrably lower affinity
(median K_D > 30 µM) (Figure 1C). This suggests a low
false-positive rate and indicates that array-positive interactions
correspond to high affinity binding events at a
high frequency.
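
The difference between these two affinity groups can be illustrated with a simple 1:1 binding model, in which the fraction of peptide sites occupied is [SH2] / (K_D + [SH2]). This is a rough sketch and not the analysis used in this study: it assumes that the 250 nM probe concentration given in the Figure 1 legend approximates the free SH2 concentration and that membrane-bound peptides behave like ligands in solution. The function name is hypothetical.

```python
def fractional_occupancy(sh2_conc_um, kd_um):
    """Fraction of peptide sites bound under an assumed 1:1 binding model.

    sh2_conc_um: free SH2 domain concentration in micromolar.
    kd_um:       equilibrium dissociation constant in micromolar.
    """
    if sh2_conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return sh2_conc_um / (kd_um + sh2_conc_um)


if __name__ == "__main__":
    probe_um = 0.25  # 250 nM GST-SH2, as stated in the Figure 1 legend
    groups = (
        ("array-positive median", 2.0),
        ("array-negative median (lower bound)", 30.0),
    )
    for label, kd_um in groups:
        occupancy = fractional_occupancy(probe_um, kd_um)
        print(f"{label}: K_D = {kd_um} uM, occupancy = {occupancy:.3f}")
```

Under these assumptions a K_D of 2 µM corresponds to roughly 11% occupancy, whereas a K_D of 30 µM corresponds to under 1%, which is consistent with weaker interactions falling below the array-positive cut-off.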

Probing of arrays individually with each of 50 SH2
domains provides a snapshot of SH2 specificity (Figure 2A).
As we have previously shown, this method is highly reproducible
[22]. Independent peptide arrays and protein preparations
reveal high reproducibility for the selected SH2
domains (Shb, Ship2, Sh3bp2) (Figure 2B). To confirm
interactions between full-length proteins we performed a
set of GST-SH2 pull-down experiments using lysates of CHO cells stably
expressing InsR and IRS-1 with or without stimulation
with insulin (Additional file 2: Figure S4). These lysates
were incubated with GST-SH2 domains and precipitated
using glutathione-agarose beads to identify SH2 domains
that were capable of precipitating phospho-IRS1 or phospho-InsR.
This confirmed previously described interactions
such as those involving the PI3K_C, Shp2_N
and Fyn (as well as related Src and Itk) SH2 domains
[45-47]. In addition, interactions observed on the peptide
arrays were confirmed for Rasa1, Vav1, Abl2
and PLC-γ1.

The literature is a rich source of detailed interactions 
that provide potential validation. Since the discovery of 
the SH2 domain in 1986 [48], detailed study has uncovered
a large set of SH2 interactions. Any high-throughput
technique would be expected to capture most of
these interactions, and failure to do so may be taken as
evidence of false-negative results. Each of our addressable
peptide arrays included a set of 12 designed control
peptides for which 22 reported interactions covered a
range of K_D values. In addition, we noted 43 interactions
with the 13 signaling proteins represented on the arrays 

[Figure 2 image. Panel labels: SH2D2A, ZAP70 N, MATK, SYK C. Scatter-plot axes: SHB (A), SH3BP2 (A), SHIP2 (A) [Licor Intensity].]

Figure 2 Addressable peptide arrays reveal SH2 domain selectivity. (A) 50 SPOT arrays probed with 50 GST-SH2 domains reveal the
highly selective nature of SH2 domain-phosphopeptide interactions. Interactions were detected using anti-GST antisera and Alexa Fluor-680-
labeled anti-goat secondary antibody and the intensity of signals recorded using LiCor Odyssey. (B) Two separate peptide arrays were probed
with independent SH2 domain preparations for three SH2 domains (SHB, SHIP2, SH3BP2). The scatter plots reveal some variability between the
independent SPOT experiments yet show a strong correlation (R²).

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reported in UniHI [49] from the interaction databases of 
MINT [50], BIND [51], HPRD [52], and DIP [53]. Of the 
22 designated control interactions, 18 were noted as 
array-positive (Table 1). Of the remaining four expected 
interactions, three have measured affinities, and in all 
cases the equilibrium dissociation constant is weaker 
than 16 uM. All of the array-positive interactions for 
which affinity is reported have K D values stronger than 
4.1 uM. Thus, this control set suggests an approximate 
threshold of binding in the range of 10 uM ± 5 uM. Of 
the 43 database-reported interactions, most were array 
positive and of those that were not array-positive, a 
number were just sub-threshold and judged to be inde- 
terminate (Table 1). The ability to recapitulate the vast 
majority of known (literature-reported) interactions and 
to verify novel interactions by orthologous methods is 
indicative of a high quality dataset [54]. 
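
The control-set figures quoted above can be checked with a few lines of arithmetic. The sketch
below uses only numbers stated in the text (22 control interactions, 18 array-positive, a 4.1 µM
bound on array-positive affinities and a 16 µM bound on the measured array-negative ones). The
variable names are illustrative, and taking the midpoint of the two bounds is an assumption about
how the approximate threshold could be bracketed, not a description of the procedure used in the
study.

```python
# Figures quoted in the text for the designed control set.
n_control = 22          # reported control interactions
n_array_positive = 18   # controls scored array-positive

recall = n_array_positive / n_control
print(f"control recall: {recall:.1%}")

# Bounds on the detection threshold (micromolar), as stated in the text:
# array-positive controls bind more tightly than 4.1 uM, and the measured
# array-negative controls bind more weakly than 16 uM.
kd_positive_bound_um = 4.1
kd_negative_bound_um = 16.0

# Assumption: bracket the threshold by the midpoint of the two bounds.
midpoint_um = (kd_positive_bound_um + kd_negative_bound_um) / 2
half_width_um = (kd_negative_bound_um - kd_positive_bound_um) / 2
print(f"threshold bracket: {midpoint_um:.2f} +/- {half_width_um:.2f} uM")
```

Under this reading the bracket is centered near 10 µM, in line with the approximate threshold
quoted in the text.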

Reconciling conflicts with other datasets 

As noted above, this study performs well in terms of reproducing the literature-reported
interactions between the 50 SH2 domains tested and the 13 proteins represented on the
addressable arrays (Table 1). A handful of differences with literature-reported interactions
must, by necessity, be reconciled. Our assumption is that a high-throughput (HTP) study such as
this one should capture upwards of 85% of known (literature-reported) interactions and that
results that differ from low-throughput studies described in the literature should be subject
to further testing to identify the nature of the discrepancy and reveal any weakness in the HTP
dataset [55]. We examined a set of potential discrepancies and found that in each case our
dataset held up well. For instance, FGFR1 Y-766 (SNQEpYLDLSMP) is reported to bind to PLCγ1 in
a pTyr-dependent manner based on mutational analysis of FGFR1 [55,56]. We tested the PLCγ2 SH2
domain with an analogous peptide from FGFR3 Y-760 (STDEpYLDLSAP) and failed to detect any
interaction. Direct measurement of peptide binding to either the PLCγ2_N or PLCγ2_C SH2 domain
by fluorescence polarization in solution also failed to detect an interaction, supporting the
results on the array (Table 2, Additional file 2: Figure S3). This may imply either that this
is a binding event specific to PLCγ1 (and not PLCγ2), or that the interaction reported at the
level of the full-length protein is more complex, perhaps requiring secondary contact sites
that are not available within the context of the short peptide used in the current study. In
several other cases, literature-reported interactions that were array-negative turned out to be
interactions with IC50 or KD values above 10 µM (Table 1). It is likely that a few
low-micromolar or even sub-micromolar binding events could be assigned as array-negative in our
study due to synthesis yield heterogeneity and the fact that we are limited to arraying at one
concentration (0.25 µM in this study). We decided to design an empirical reporting scheme that
was conservative, sacrificing many true positives in order to limit the false positives that
would naturally have arisen in the process of trying to minimize false negatives. We have made
an effort to limit false negatives to those of lower affinity, and we are aware of no instance
in our dataset of a sub-micromolar affinity interaction being scored as array-negative.
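
To illustrate why working at a single concentration biases detection toward tighter binders, the
following sketch computes fractional occupancy for a simple 1:1 binding model. It is a
simplification under stated assumptions: the 0.25 µM value from the text is treated as the free
concentration of the soluble binding partner, depletion and avidity effects on the membrane are
ignored, and the function name and the list of KD values are chosen for illustration only.

```python
def fraction_bound(conc_um, kd_um):
    """Fractional occupancy for simple 1:1 binding: [L] / ([L] + KD)."""
    return conc_um / (conc_um + kd_um)

PROBE_UM = 0.25  # single concentration quoted in the text (assumed to be free ligand)

# KD values spanning sub-micromolar binders to the 16 uM bound mentioned in the text.
for kd_um in (0.1, 1.0, 4.1, 10.0, 16.0):
    occupancy = fraction_bound(PROBE_UM, kd_um)
    print(f"KD = {kd_um:5.1f} uM -> occupancy {occupancy:.3f}")
```

In this idealized model occupancy falls steeply as KD rises above the probing concentration, which
is consistent with weaker interactions dropping below an intensity-based cutoff.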

Many high-affinity interactions, such as the interactions between the Src and Lck SH2 domains
and p130Cas pY-664, fell into our array-indeterminate set (1x-3x mean), likely due to the
synthesis efficiency and accessibility of these particular peptides and the semi-quantitative
nature of the system. Indeed, many of the peptide-SH2 interactions that fall in the
indeterminate set are likely to be real binders. Some surprising differences between SH2
domains can be reconciled this way. For instance, the Abl1 and Abl2 SH2 domains differ
significantly in their array-positive interactions. This is surprising considering the sequence
similarity between the two domains. Because of the heterogeneities inherent in this study
design as indicated above and the similarities between the two proteins, discrepancies of this
sort likely represent false negatives. In total, the limited number of incongruities between
the current data set and the literature are thus largely reconcilable.
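
The three reporting categories can be sketched as a simple fold-over-mean rule. The cutoffs
(below 1x mean, 1x-3x mean, above 3x mean) follow the text and the Figure 3 legend; the
assumption that "mean" refers to the mean signal over all peptides on one array, the function
name, and the example intensities are illustrative and not taken from the study.

```python
from statistics import mean

def classify_spots(intensities, low=1.0, high=3.0):
    """Label each peptide spot by its signal relative to the array mean."""
    array_mean = mean(intensities.values())
    calls = {}
    for peptide, signal in intensities.items():
        fold = signal / array_mean
        if fold > high:
            calls[peptide] = "positive"
        elif fold >= low:
            calls[peptide] = "indeterminate"
        else:
            calls[peptide] = "negative"
    return calls

# Hypothetical intensities for one array (not data from the study).
example = {f"bg_{i}": 2.0 for i in range(8)}
example["weak"] = 12.0
example["strong"] = 52.0

calls = classify_spots(example)
print(calls["strong"], calls["weak"], calls["bg_0"])
```

Because a mean-based cutoff depends on every spot on the array, a strong binder whose peptide
synthesized poorly can land in the indeterminate band, as described above.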

A high-throughput binding study that reported interactions between a large set of SH2 domains
and phosphopeptides within four receptor tyrosine kinases (including IGF-1R and FGFR1) overlaps
with the present study [57]. Our dataset validates only 5 of 51 of these interactions and
describes 6 additional interactions not reported in that study. This disagreement is in
contrast to the high degree of consensus between the present study and a wide range of previous
studies (Table 1). We examined a number of the interactions reported by Kaushansky et al. using
a combination of an orthogonal experimental approach, comparison to consensus binding motifs,
and literature validation. As noted above, SH2 domains have well-described binding motifs and
adhere to these remarkably well in the current study. Kaushansky et al. report a large number
of interactions that do not approximate the binding motifs that the corresponding SH2 domains
are known to bind. In addition, SH2 domains make use of contextual sequence information and
non-permissive residues that block binding in order to improve selectivity [22]. For example,
the Grb2 family has a very strong preference for an asparagine residue at the +2 position and
will not tolerate a proline residue at the +3 position [19-22]. Kaushansky et al. report a
series of Grb2 interactions with peptides that do not contain the required permissive residues,
and furthermore many that contain strong non-permissive residues (Additional file 2: Figure S7
and Table S3). Similarly, Crk SH2 requires a +3 Leu or Pro, yet this motif is absent in many of
the Crk SH2 binding peptides reported by Kaushansky et al. Indeed, the 46 interactions reported
by Kaushansky et al. that we fail to confirm overwhelmingly involve peptides that do not
conform to the consensus motifs with which the cognate SH2 domains are known to interact
[19,20,29]. In addition, a number of apparent "hub" peptides reported by Kaushansky et al.
contain cysteine residues (e.g. FGFR1 pY-583, FGFR1 pY-605, FGFR1 pY-730), and the interactions
were probed in the absence of reducing agents [57,58]. In the present study, binding was
assayed in the presence of 1 mM DTT and cysteine residues in peptides were substituted with
serine [59]. Kaushansky et al. provide no corroboration of their results by either orthogonal
assay or literature validation, while the present study provides extensive corroboration.

Even in the cases where our data overlap, the apparent KD values reported by Kaushansky et al.
appear inconsistent with direct measurements of well-controlled solution binding by
fluorescence polarization [57]. For example, Kaushansky et al. report a KD of 175 nM for the
interaction between Rasa1-N-SH2 and FGFR1 pY-463, while we measured a KD of 1.54 µM by
fluorescence polarization (Additional file 2: Figure S7). Additionally, there are 6
interactions that we report that are not noted by Kaushansky et al. We picked one of these
binary pairs at random, the interaction between Crk SH2 and FGFR1 pY-463, and tested binding in
solution. We measured a KD of 380 nM for this interaction, validating this binding event.

Taken as a whole, comparisons with the literature validate the results presented in this study.
Literature-reported interactions that are not array-positive tend to fall into three
categories: 1) low affinity interactions; 2) near misses that are array-indeterminate and thus
just below threshold; or 3) cases where orthogonal measurement confirms no interaction at the
level of the individual SH2 domain and 11-mer phosphopeptide. Comparison with an SH2 domain
array study reveals limitations in that technique and suggests that SH2 domain arrays on glass
substrates may suffer from a high rate of false-positive and false-negative interactions. This
is consistent with results from the same group investigating PDZ domain binding using a similar
protein microarray method, which concluded that the technique resulted in a false-positive rate
of approximately 50% and poor correspondence between array-estimated and solution-measured
equilibrium dissociation constants [60-62].



Metadata-rich interaction maps 

Probing arrays with 50 SH2 domains identifies a total of 529 array-positive interactions,
together with 5949 array-negative and 1122 indeterminate SH2-ligand pairs. Array-positive
interactions between SH2 domains and pTyr sites map the potential SH2 interactome. The
connections between SH2 domains and InsR, IGF-1R, IRS-1, IRS-2, FGFR1, FGFR2, FGFR3, FGFR4,
FRS2 and FRS3 together with p130Cas, PLCγ1 and p62DOK1 highlight a wide range of putative SH2
interactions within the immediate FGF and Ins/IGF-1 signaling networks (Figure 3). The
prediction of novel interactions comes with the inherent caveat that a given SH2 protein would
need to be co-expressed with its interaction partner. For example, Grap and Gads are expressed
only in certain hematopoietic cells [63,64]. Interactions recorded for the SH2 domains of Gads
and Grap are not useful for predicting interactions in other cell types but may be considered
as supporting data for the interactions of the closely related Grb2 SH2 domain. The similar
specificity of the SH2 domains of Grb2, Gads and Grap results in an overlapping set of target
peptides, and the independent binding of all three SH2 domains to a peptide increases our
confidence that it is in fact a high-quality ligand for this class of SH2 domains.
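
As a bookkeeping check, the category counts quoted above can be tallied against the full
50 x 192 grid. The counts are from the text; the interpretation of the remainder in the comment
below is an inference and is not stated in the text.

```python
n_domains = 50
n_peptides = 192

positive, negative, indeterminate = 529, 5949, 1122

scored_pairs = positive + negative + indeterminate
all_pairs = n_domains * n_peptides
unscored_pairs = all_pairs - scored_pairs

# Inference (not stated in the text): the unscored remainder may correspond to
# pairs involving peptides that were discarded as non-specific.
print(scored_pairs, all_pairs, unscored_pairs)
print(f"array-positive fraction of scored pairs: {positive / scored_pairs:.1%}")
```

On these figures roughly 7% of scored SH2-peptide pairs are array-positive, which gives a sense of
how selective the domains are on this peptide set.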

Figure 3 High-resolution interaction maps detail a potential SH2 domain interactome. A phosphotyrosine interactome for 13 proteins
involved in FGF-family and insulin-family signaling and 50 SH2 domain partners. Phosphotyrosine peptides are indicated by their position within
their host protein and color-coded as either PhosphoSite-reported phosphorylation sites (yellow); sites not reported as phosphorylated (red); sites
not reported to be phosphorylated but where a closely related site on a paralogous protein is known to be phosphorylated (red/yellow); or
peptides discarded as non-specific (black). Interactions between the vertices of SH2 domains and phosphopeptides identified in this study are
indicated as edges (lines) and color-coded according to the level of support provided by previous studies: if the precise phosphorylation site has
been reported to interact with the noted SH2 domain the edge is denoted in red. A black line represents proteins that are reported to
interact in interaction databases including HPRD, BIND, MINT and DIP, but for which the site of interaction is unknown. SH2 interactions not
confirmed by literature but whose binding is greater than 3X mean on the array are represented with grey lines.

To enhance the interaction maps derived from the current study, we incorporated multiple layers
of additional data gleaned from a variety of sources. Specific phosphopeptides reported in the
PhosphoSite database are noted for each of the 13 target proteins in Figure 3 (Additional file
1: Table S1) [65]. Reported phosphorylation remains a moving target, particularly as certain
sites may be phosphorylated only in certain tissues or transiently upon recruitment of specific
kinases [33]. In cases where phosphorylation of a tyrosine residue has been reported, we assume
that region to be solvent accessible and capable of interactions. If phosphorylation has not
been reported, solvent accessibility may be considered as a minimal threshold for
phosphorylation and SH2 domain binding. This is with the caveat that certain residues, such as
the activation loop tyrosine in the kinase domain of the InsR and IGF-1R, are buried in the
inactive state but become phosphorylated and solvent exposed in the activated state. The
phosphorylated and exposed activation loop is then able to bind to SH2 domains [66]. Given the
dynamic nature of protein structures and the ability of buried residues to become exposed upon
structural rearrangement, one cannot presuppose that buried residues never become exposed.
Nonetheless, solvent accessibility provides an additional level of support for potential
phospho-dependent interactions in cases where phosphorylation has not been reported. Existing
structures provide a greater level of confidence in such interactions while at the same time
identifying potentially anomalous interactions with buried peptides. The Gerstein Accessible
Surface algorithm was employed to calculate the accessible molecular surface [67,68] of each
tyrosine residue within structure files PDBID:1IRK, 2DTG, 1P40, 1K3A, 1IRS, 1QQG, 2FGI, 2PVF,
2PSQ, 1XRO, 2YS5, 2YT2, 2V76, 1WYX, 1HSQ, and 2HSP that represent regions of InsR, IGF-1R,
IRS-1, FGFR1, FGFR2, FRS2, FRS3, p62DOK, p130Cas and PLCγ in various conformations (Additional
file 2: Table S4). Sites that fell below the threshold of the minimally accessible
phosphorylation site (excluding the activation loop tyrosine) are marked in orange text for the
residue number in Figure 3. Many of these sites are also excluded as non-specific interaction
sites, likely reflecting their hydrophobic nature. Inclusion of structural data, where
available, makes use of a significant resource to interpret potential pTyr interaction data.
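
The accessibility filter described above amounts to taking the least accessible reported
phosphorylation site as a cutoff and flagging any tyrosine that falls below it. A minimal sketch
follows; the function name, site labels and surface-area values are hypothetical, and the
per-residue areas would in practice come from a surface calculation on the structure files rather
than from this code.

```python
def flag_buried_sites(accessibility, phosphorylated, exclude=()):
    """Return the cutoff and the tyrosines less accessible than any reported phosphosite.

    accessibility:  dict mapping site name -> accessible surface area
    phosphorylated: sites with reported phosphorylation
    exclude:        sites left out entirely (e.g. an activation-loop tyrosine
                    that is buried only in the inactive state)
    """
    reference = [accessibility[s] for s in phosphorylated if s not in exclude]
    threshold = min(reference)
    flagged = sorted(site for site, area in accessibility.items()
                     if area < threshold and site not in exclude)
    return threshold, flagged

# Hypothetical surface areas, for illustration only.
areas = {"Y100": 85.0, "Y200": 12.0, "Y300": 40.0, "Y400": 3.0, "Y500": 25.0}
threshold, buried = flag_buried_sites(
    areas,
    phosphorylated={"Y100", "Y300", "Y400"},
    exclude={"Y400"},  # stands in for the activation-loop tyrosine
)
print(threshold, buried)
```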

Previously reported specific SH2-phosphopeptide interactions confirmed in this study (Table 1)
are highlighted as red lines (Figure 3) and represent the highest confidence interactions.
Noted as black lines are cases for which protein-protein interactions have been reported in
MINT [50], BIND [51], HPRD [52], and DIP [53], without reference to specific binding sites or
direct involvement of an SH2 domain. Interactions noted in the current study that are not
listed in any of the major interaction databases are represented as grey lines.

Position weighted matrices define physiological ligand 
specificity 

To represent the specificity of SH2 domains in this study we define position weighted matrices
(PWMs) based on the array-positive peptides. PWMs such as the position-specific scoring matrix
(PSSM) [21] are a well-established method to describe binding motifs. In a PWM, each matrix
column describes the probability that a given amino acid will be found at that ligand position.
The PWM may also be visualized as a sequence logo [69] (Figure 4A). The 192 physiological
peptides represented on the arrays in this study do not conform to a random distribution of
residues at each position. To compensate for this, the matrices were corrected for the
prevalence of amino acid residues at each position in the total data set. In addition, the
absence of binding to a given peptide may provide data on inhibitory effects of specific
residues. For instance, lack of binding may result from either the absence of critical
permissive residues or from the presence of inhibitory residues at specific positions [22]. To
make use of both array-positive and array-negative data, we computed the frequency of
occurrence of a given residue at each position among the array-positive peptides (posPWM). This
is compared to a PWM of the expected frequency across all peptides, excluding non-specific
peptides (exPWM). The scoring matrix that results from subtracting exPWM from posPWM expresses
the deviation observed in the array-positive data from that of all specific peptides on the
array. We term this the expectation-deviation scoring matrix (EDSM).

[EDSM] = [posPWM]-[exPWM] 
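
The subtraction above can be sketched in a few lines. This is a minimal illustration rather than
the implementation used in the study: it assumes equal-length peptides aligned on the
phosphotyrosine (written here as lowercase "y"), uses plain residue frequencies without
pseudocounts, and the function names and example peptides are hypothetical.

```python
from collections import Counter

def pwm(peptides):
    """Column-wise residue frequencies for equal-length aligned peptides."""
    length = len(peptides[0])
    columns = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        columns.append({aa: n / len(peptides) for aa, n in counts.items()})
    return columns

def edsm(positive_peptides, specific_peptides):
    """EDSM = posPWM - exPWM, computed position by position."""
    pos_pwm = pwm(positive_peptides)
    ex_pwm = pwm(specific_peptides)
    matrix = []
    for pos_col, ex_col in zip(pos_pwm, ex_pwm):
        residues = set(pos_col) | set(ex_col)
        matrix.append({aa: pos_col.get(aa, 0.0) - ex_col.get(aa, 0.0)
                       for aa in residues})
    return matrix

# Hypothetical aligned peptides ("y" marks phosphotyrosine); not study data.
specific = ["AyVNV", "EyENL", "DyDLP", "SyAQM"]   # all specific peptides on the array
positive = ["AyVNV", "EyENL"]                     # array-positive subset for one domain

scores = edsm(positive, specific)
print(scores[3])  # deviation at the +2 position relative to "y"
```

Positive entries mark residues enriched among binders relative to the peptide set as a whole,
negative entries mark residues that are depleted, and each column sums to zero.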

By expressing differences between peptides that bind specifically and the peptide set as a
whole, the EDSM attempts to compensate for any inherent bias arising in the relatively small
set of non-random peptides drawn from physiological proteins. The EDSM for each SH2 domain in
this study is visualized using sequence logos (Additional file 2: Figure S5) and condensed into
a generalized statement of physiological specificity in the form of a regular expression (Table
3). A distance matrix comparing the EDSMs for the physiological specificity of the SH2 domains
describes families of SH2 domains related by their preference for physiological ligands
(Additional file 2: Figure S6). This is represented as an unrooted tree of SH2 domain
specificity (Figure 4B). Six classes of general specificities are displayed among the SH2
domains tested in this study, revealing similarity among SH2 domains within the same family
(e.g. Grb2, Gads, Grap) and across different families (Sh2d1b, Ship2) but also subtle
differences (e.g. Abl1 and Abl2). Although the EDSM is informed by both permissive and
non-permissive effects, the limited dataset afforded by the addressable arrays in this study
limits the utility of the resulting matrices for extrapolating information on non-permissive
residues.

Figure 4 Specificity for physiological peptides defines functional groups of SH2 domains. (A) Grb2 SH2 domain positive peptides are
highlighted and then represented as an EDSM logo. See Figure S5 for EDSM logos of all tested SH2 domains. (B) An unrooted dendrogram
clusters families of SH2 domains related by similar binding patterns. A distance matrix between EDSMs was computed and used to generate an
unrooted distance tree (see Figure S6). This is artistically represented as a dendrogram with general specificity information overlaid and functional
classes denoted by branch color.
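
A distance matrix between EDSMs can be sketched as follows. The text does not state which
distance measure was used, so the Euclidean distance here is an assumption, as are the function
names and the toy matrices. Tree construction from the resulting matrix is not shown.

```python
from math import sqrt

def edsm_distance(a, b):
    """Euclidean distance between two EDSMs (lists of residue -> score dicts)."""
    total = 0.0
    for col_a, col_b in zip(a, b):
        for aa in set(col_a) | set(col_b):
            diff = col_a.get(aa, 0.0) - col_b.get(aa, 0.0)
            total += diff * diff
    return sqrt(total)

def distance_matrix(edsms):
    """Symmetric matrix of pairwise distances, keyed by domain name."""
    names = sorted(edsms)
    return {x: {y: edsm_distance(edsms[x], edsms[y]) for y in names} for x in names}

# Hypothetical two-position EDSMs; names and values are illustrative only.
toy = {
    "dom_A": [{"V": 0.5, "E": 0.25}, {"N": 0.5}],
    "dom_B": [{"V": 0.5, "E": 0.25}, {"N": 0.25}],
    "dom_C": [{"D": 0.5}, {"P": 0.5}],
}

dist = distance_matrix(toy)
closest_to_A = min((n for n in dist if n != "dom_A"), key=lambda n: dist["dom_A"][n])
print(closest_to_A, round(dist["dom_A"]["dom_B"], 3))
```

Domains with similar residue preferences end up close together in such a matrix, which is the
property a distance tree then makes visible.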



Discussion 

The analysis of SH2-mediated interactions with peptide ligands representing the receptors and
substrate proteins of the insulin, IGF-1 and FGF systems described herein reconstructs the set
of potential phosphotyrosine-mediated interactions that determine the capacity of these systems
to recruit signaling proteins upon activation. The potential interactome outlines the possible
states that may participate in signaling. Among the factors that determine the possible
signaling networks initiated by activated receptors are 1) the available set of SH2 proteins
expressed in specific cells; and 2) the capacity of phosphorylated receptor and scaffold sites
to recruit those SH2 proteins. The 111 SH2 domain proteins extant in the human genome vary
extensively in their tissue- and cell-specific expression [15,36]. In some cases these
expression differences are drastic and even define highly tissue-specific signaling networks
such as those in B- and T-lymphocytes [14,15]. Among the 38 SH2 families, 33 possess at least
one gene duplicate, allowing a duplicate copy to acquire new functions such as specialized
tissue functions or novel scaffolding capabilities [70]. A family member expressed in one
tissue may perform a function redundant with that of its paralog in another tissue but may also
diverge in function (Additional file 2: Table S5). The potential interactome for SH2 domains
indicates many cases of potential overlap in binding, resulting in pTyr sites that may act as
hubs for multiple interactions or serve distinct binding functions in cases where the SH2
complement varies in different cells. The varied potential interaction permutations, or
microstates, in turn, are the basis for highly cell-specific signaling outcomes from discrete
signal inputs [34]. In simple terms, differences in the available phosphorylated tyrosine sites
as well as in the expression of SH2 domain proteins themselves have the potential to furnish
related but distinct signaling events in response to the same input signal (Figure 5A).
Currently the phosphorylation datasets available from PhosphoSite and PhosphoELM provide only a
static view of receptor and scaffold phosphorylation. Even within a cell, the available
complement of pTyr sites and locally available SH2 domain proteins may vary over the lifetime
of a signal. Protein interaction microstates may differ according to the intensity of ligand
stimulation and change as signaling complexes move within the cell, for instance as receptors
are internalized on signaling endosomes (Figure 5B). For example, Grb10 and Grb14 are closely
related adaptor proteins that share similar functions by binding to InsR and negatively
regulating insulin signaling. While both genes share high expression in the pancreas,
expression varies among adipose, liver and the heart (Figure 5C). However, little is known
about the temporal and spatial dynamics between these two adaptors. Recently, multiple reaction
monitoring (MRM) mass spectrometry has been applied to the Grb2 adaptor to map the dynamic
interaction states upon stimulation with various growth factors [71]. Analyses of this type
will allow us to better dissect the vast number of microstates among different tissues. Thus,
potential interactomes represent crucial datasets to interpret cell- and tissue-specific
signaling events. This is particularly relevant in human development and diseases such as
cancer in which receptor tyrosine kinases are commonly overexpressed, sometimes by several
orders of magnitude. In such pathologies, the primary signaling pathways may be titrated out
and novel, normally non-physiological pathways may become activated. For instance, IGF-1R is
either overexpressed or hyperphosphorylated and deregulated in a range of cancers and is
currently one of the most studied molecular targets in the field of oncology, yet direct
targeting of IGF-1R has proven problematic due to its wide range of important physiological
functions [72-74]. Under conditions of hyperphysiological abundance of IGF-1R pTyr sites
available for SH2 binding, the potential interactome suggests the potential for non-canonical
pathways to become activated, perhaps hinting at novel targets for therapeutic intervention.

Figure 5 Tissue co-expression and microstate of the Insulin/IGF-1 system. Protein interaction microstates across different cell types and
across time and space. (A) Co-expression between receptors and SH2 domains can influence the microstate of a specific tissue.
(B) Phosphorylation of receptors under stimulation conditions can determine the temporal and spatial events of SH2 ligand binding within a cell.
(C) Hierarchical clustering of the insulin-responsive tissue expression levels for human SH2 domain-containing genes.

Table 3 Specificities obtained using Physiological Ligands

SH2 Domain    Specificity
ABL1          [pY] [D/E/S] [D/E] [P/N/D/E]
ABL2          [pY] [V] [N/Q]
BLK           [pY] [D/EAp] [D/E/L] [P/I]
BRK           [pY] [D/E] [X] [D/E/φ]
CRK           [D] [X] [pY] [D] [V/L] [P] [P]
CRKL          [D] [X] [pY] [D] [φ] [P] [P/R]
DAPP1         [pY] [X] [X] [D/E/φ] [E]
FER           [D/E] [D/E] [pY] [D/G] [D/E] [φ]
FGR           [E] [P/D/E] [X] [pY] [D/E/G] [X] [D/E/φ] [Y]
FYN           [pY] [X] [D/φ] [φ]
GADS          [pY] [V] [N]
GRAP          [pY] [V/E] [N]
GRB2          [pY] [V/E] [N]
GRB7          [pY] [E] [N/Y]
HCK           [D/E] [D/P] [X] [pY] [D/E/G] [D/E/φ] [P/I/L]
ITK           [pY] [φ] [X] [D/φ]
LCK           [pY] [D/E/G] [D/E] [P/L] [P]
LYN           [pY] [D/E/G] [D/E] [P] [P]
MIST          [pY] [φ] [Q [φ] [D/E] [φ]
NCK1          [pY] [D/E] [E/L] [P/V]
PI3K1_N       [pY] [V/D] [X] [I/M/V]
PI3K1_C       [pY] [V/M/E] [N/T/M] [M]
PLCG1_C       [pY] [φ] [X] [D/E]
PTPN11_N      [pY] [φ] [X] [φ] [D/E] [φ]
RASA1_N       [pY] [φ] [X] [D/φ]
RASA1_C       [pY] [X] [X] [D/E/φ]
SH2B          [pY] [X] [X] [D/E/φ]
SH2D1B        [pY] [X] [X] [φ]
SH2D2A        [pY] [E] [N/T] [D/φ]
SH3BP2        [pY] [D/E] [N] [V]
SHB           [pY] [φ] [X] [φ] [D/E] [φ]
SHD           [pY] [φ] [X] [φ] [D/E] [φ]
SHE           [pY] [φ] [X] [φ] [D/E] [φ]
SHF           [pY] [φ] [X] [φ] [D/E] [φ]
SHC1          [pY] [D/E/G] [D/E/φ] [φ]
SLNK          [pY] [G/D/V] [DAT] [D/φ]
SRC           [D/E] [X] [X] [pY] [D] [D/E/φ] [P/I]
SYK_C         [φ] [pY] [V] [X] [D/E/φ] [D/E]
TENC1         [pY] [E]
VAV1          [pY] LV/FJ/L] [X] [P]
YES           [pY] [D/E/G] [D/E/φ] [φ]
ZAP70_N       [P] [X] [pY] [X] [X] [cp/cp]

The general specificity information obtained from the arrays is expressed in a regular
expression form. Amino acid residues are indicated by their single-letter codes. Groups of
amino acids are noted as φ = hydrophobic residues (Val, Ile, Leu, Phe, Trp, Tyr, Met);
ψ = aliphatic residues (Val, Ile, Leu, Met); ζ = polar residues (Asn, Gln, Ser, Thr, Glu, Asp,
Lys, Arg, His).
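
The bracketed specificities of Table 3 can be applied to a peptide sequence by translating them
into an ordinary regular expression. The sketch below is illustrative: the translation function
is hypothetical, phosphotyrosine is written as lowercase "y" by convention of this example, [X]
is assumed to mean any residue, the hydrophobic and aliphatic groups from the Table 3 legend are
written as φ and ψ, and the first test peptide is invented. The second peptide is the FGFR3 Y-760
sequence quoted earlier (STDEpYLDLSAP).

```python
import re

# Residue groups from the Table 3 legend; "X" (any residue) is an assumption.
GROUPS = {
    "φ": "VILFWYM",               # hydrophobic
    "ψ": "VILM",                  # aliphatic
    "X": "ACDEFGHIKLMNPQRSTVWY",  # any of the 20 standard residues
}

def motif_to_regex(motif):
    """Translate e.g. '[pY] [V/E] [N]' into a compiled regular expression."""
    parts = []
    for token in re.findall(r"\[([^\]]+)\]", motif):
        if token == "pY":
            parts.append("y")  # phosphotyrosine marker used in this sketch
            continue
        allowed = "".join(GROUPS.get(item, item) for item in token.split("/"))
        parts.append("[" + "".join(sorted(set(allowed))) + "]")
    return re.compile("".join(parts))

grb2 = motif_to_regex("[pY] [V/E] [N]")      # GRB2 row of Table 3
print(grb2.pattern)
print(bool(grb2.search("AAAEyVNVAAA")))      # invented peptide with a pY-V-N core
print(bool(grb2.search("STDEyLDLSAP")))      # FGFR3 Y-760 peptide from the text
```

A match of this kind indicates only that a sequence is compatible with the condensed motif. It
does not capture non-permissive residues or affinity.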

Even in normal physiological circumstances of healthy 
tissues, the potential interactome may inform our under- 
standing of tissue-specific signaling events. A variety of 
tissues can respond to insulin stimulation, including 
adipose, muscle, pancreas, liver, brain etc. [75,76]. SH2 
domain-containing proteins vary widely in their expres- 
sion in various cells and tissues (Figure 5C). While this 
likely represents only a piece of a much larger puzzle, it 
is conceivable that some of the observed tissue-specific 
responses and downstream signaling differences may re- 
late to the available complement of SH2-containing sig- 
naling proteins and their ability to interact with available 
pTyr sites. In this way, the potential interactome and 
cell-specific expression combine to determine effective 
signaling networks. 

Consensus motifs and co-evolution 

The interaction data also reveal the specificity of 50 SH2 domains for a set of physiological
peptides. Typical binding motifs for SH2 domains describe the residues at positions +1 to +4
C-terminal of the essential phosphotyrosine [77-79]. SH2 domain peptide binding motifs have
been described for a wide range of SH2 domains using peptide library approaches [19,20,29].
Binding motifs obtained from peptide library approaches represent optimal solutions
unconstrained by physiological parameters such as the confounding effects of kinase recognition
or structural influences of native proteins. The motifs described herein represent binding to
'real-world' peptides and thus stand as a relevant contrast to peptide-library based data.
However, it should be noted that this dataset corresponds to a potential physiological
interactome. Because not all of the peptides have been confirmed to be phosphorylated in vivo,
our interaction maps are best used in conjunction with the expanding mass spectrometry
literature and its associated databases.

Broadly speaking, the SH2 consensus binding motifs identified from interactions observed using
addressable arrays of physiological peptides are remarkably similar to the motifs described
using peptide library approaches (Table 3). Yet binding specificities observed for
physiological phosphotyrosine peptide ligands may in some cases represent more than the
specificity of the isolated SH2 domain. The EDSM position weighted matrices noted in Additional
file 2: Figure S5 reveal a number of cases in which residues outside of the conventional window
of positions +1 to +4 appear to influence binding. Longer contact regions have been noted for
certain SH2 domains in the past, though these are generally exceptions to the rule. For
instance, the SH2 domain of SH2D1A/SAP binds to an extended peptide in the SLAM receptor
comprising residues -2 to +3 and shows a diminished dependence on phosphorylation of the
tyrosine for binding [43]. Physiological peptide ligands co-evolve to allow recognition by
their cognate SH2 domain partner, while also acting as competent substrates for their cognate
kinases. In some cases, the observed specificity for physiological peptide ligands may
therefore represent an amalgam of SH2 specificity, kinase recognition, and other factors. This
may, for example, explain the apparent observed preference of the Crk SH2 domain for an Asp
residue at the -2 position. The presence of an aspartic acid residue at the -2 position does
not appear to contribute to Crk SH2 domain binding (Figure 4B); instead, this may reveal a
signature for a distinct event such as kinase recognition for a specific subset of
physiological peptides. Indeed, a large number of tyrosine kinases have a reported preference
for acidic residues preceding the target tyrosine residue [80,81]. Not surprisingly, acidic
residues are commonly observed in the EDSM logos for the SH2 domains (Additional file 2: Figure
S5). In addition to acting as kinase substrates and SH2 domain binding sites, the peptide motif
must also presumably be surface exposed, and potentially disordered prior to binding, and these
factors may also contribute to the overall physiological peptide motif. Combining multiple
motifs in computational searches has been shown to markedly increase predictive accuracy [82],
suggesting that the inclusion of indirect components such as kinase specificity may make for a
more robust predictor of SH2 interactions. While the current data set is relatively small in
size, larger sets of data identifying physiological peptide interactions may provide useful
data for investigating the overlapping influences of multiple events required for functional
signaling based on overlapping motifs.

In our analysis we find that peptides reported to be 
phosphorylated in PhosphoSite are significantly more 
likely to have one or more SH2 domain-binding partners 
than peptide nodes that are not currently known to be 
phosphorylated. This is not surprising given that evolu- 
tionary pressure may be exerted to conserve critical 
binding sites. Conversely, given the specificity of SH2 
domains, the chances of an SH2-interacting peptide oc- 
curring by chance within a non-phosphorylated peptide 
may be assumed to be relatively low. The more residues 
that must be specified to stipulate binding, the lower the 
probability is that this will occur spontaneously within 
a non-phosphorylated sequence. Even if only one key residue,
supported by one of two secondary residues, were
sufficient to allow an SH2 domain to bind, the
chances of randomly generating an SH2 binding site
centered around a given tyrosine residue would be less than
one in a hundred. Given the specificity observed for
SH2 domains in this study, the likelihood of a ran- 
dom sequence encoding an SH2 domain ligand appears 
rather limited. The appearance of a small number of 
highly connected peptide nodes on sites not currently 
known to be phosphorylated raises the question of whether 
SH2 domain-binding might serve as a means of predicting
phosphorylation. Perhaps highly connected peptide hubs 
such as IRS1 Y-151, IRS2 Y-184, FRS3 Y-287 and FRS3 
Y-322 predict phosphorylation. ScanSite predicts the 
first three of these sites as kinase substrates, while the se- 
quence surrounding FRS3 Y-322 is identical to a known 
phosphorylation site on FRS2, suggesting that these 
may indeed turn out to be phosphorylated under ap- 
propriate conditions. 
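
As a rough check on the one-in-a-hundred estimate in the preceding paragraph, the arithmetic can be sketched as follows. The sketch assumes that all 20 amino acids are equally likely at every position and that "one of two secondary residues" means a required residue at either of two further positions; both are simplifying assumptions made here for illustration, not statements about the array data.

```python
from fractions import Fraction

# Assumption: uniform amino acid usage, 20 residues equally likely per position.
N_AMINO_ACIDS = 20


def p_specific_residue() -> Fraction:
    """Chance that one fixed position holds one required residue."""
    return Fraction(1, N_AMINO_ACIDS)


def p_either_of_two_positions() -> Fraction:
    """Chance that at least one of two positions holds its required residue."""
    miss = 1 - p_specific_residue()
    return 1 - miss * miss


# Key residue present AND at least one of the two secondary residues present.
p_site = p_specific_residue() * p_either_of_two_positions()

print(p_site, float(p_site))
```

Under these assumptions the probability is 39/8000, or roughly 1 in 205, which is consistent with the "less than one in a hundred" bound. Real proteomes do not use amino acids uniformly, so the figure is only indicative.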

A high degree of selectivity for physiological ligands 
may itself be an outcome of evolutionary pressures, as 
has been noted for yeast SH3 domains. The Sho1 SH3
domain recognizes a binding peptide in Pbs2, and no
other SH3 domain in the yeast genome cross-reacts with
the Pbs2 peptide. SH3 domains from other species that
have not been under evolutionary pressure to ignore this
site exhibit less selectivity for the Pbs2 peptide [83]. A
high degree of specificity among human SH2 domains, 
combined with cell-specific expression is consistent with 
the notion that evolutionary pressures drive selectivity of 
protein-ligand interactions. 

Comparison to the literature 

In the quarter century since the SH2 domain was first
described [48,84], hundreds of interactions have been
reported between SH2 domains and phosphotyrosine
peptides. In many cases these have been subject to intensive
biophysical analysis, yielding a considerable set of
bona fide interactions against which HTP studies can be
validated. Placing new studies within the context of the
extant literature is particularly important for systems-level
studies, for which validation is inherently limited.
In the case of the 50 SH2 domains and 192 peptides 
included in this study, we confirmed 60 interactions by 
the orthogonal method of fluorescence polarization.
We compared our results to those reported in previous 
studies. In the case of carefully controlled studies that 
examine SH2 interactions, our results closely match the 
reported interactions (Table 1). However, our results did 
not match well against one large-scale interaction study 
conducted using SH2 domain arrays (Additional file 2: 
Table S3) [57]. Our results suggest that the SH2 protein 
microarray results may suffer from high false-positive
and false-negative rates and that the reported KD values
are likely inaccurate. This is consistent with other studies
suggesting that protein microarray data are semi-quantitative
and subject to false-positive results [60],
particularly in the absence of orthogonal validation.

Several lessons may be taken from such results. They
suggest a set of standards that could be universally
applied in future high-throughput studies of protein-peptide
interactions; these are explored in detail
elsewhere [54]. First, proteins are fundamentally problematic
in that they may easily lose binding activity.
A set of positive controls is thus essential and should 
be present in every assay. Only about half of the SH2 
domains express well as fusion proteins from bacteria 
[23]. The rest suffer from poor expression and lack 
reproducible binding activity, suggesting that any use 
of these SH2 domains in high-throughput in vitro 
binding studies may yield erroneous results. The 
present study used only 50 SH2 domains that have 
previously been shown to express well and exhibit 
good solubility and reproducible binding. A second
issue relates to validation by an orthogonal method. To
this end, the current study examined 60 binary pairs by
solution-phase fluorescence polarization binding,
as well as a smaller set by GST pull-down.
A third consideration is agreement between
HTP datasets and existing literature. Well-controlled 
studies reporting peptide-binding motifs for SH2 
domains provide a wealth of data. SH2 domains bind 
to relatively specific motifs [19,29], and these provide 
excellent validation tools. Apparent interactions that 
do not match the known binding motifs are a cause 
for concern and should be further validated. As noted 
in Table 1, the dataset described in this study is in 
strong agreement with literature-reported interactions, 
and the variations can largely be rationalized. 

Concluding remarks

In examining SH2 domain interactions, we followed a 
systematic approach for systems-level interactome studies
using orthogonal validation and literature curation
as a means of enhancing confidence in the experimental 
dataset. This results in a large set of high-confidence 
interactions that outline the potential interactome between 
50 SH2 domains and 192 phosphopeptide sequences 
covering 13 proteins involved in FGF, Insulin, and 
IGF-1 signaling. The development of a detailed poten- 
tial interactome for this set of signaling components 
represents an early step towards a more detailed under- 
standing of cell-specific signaling networks. This stands 
to deepen our understanding of tissue-specific and disease- 
specific signaling networks that are predicated upon 
the varying and inevitably complex interpretation of 
the potential interactome by the available expressed 
interaction partners. 

Experimental procedures 
Plasmids and recombinant proteins 

A comprehensive list of 121 SH2 domains contained in 
111 human proteins [14] served as the starting point for 
the assembly of a large set of SH2 domain clones. The 
cDNA clones for SH2 domains were obtained from 
ATCC except for those noted otherwise. A complete list 
of source DNA and SH2 clones is shown in Additional 
file 3: Table S2. SH2 domains were cloned into pGEX-2TK
(Amersham Pharmacia) and verified by DNA sequencing.
GST fusions of SH2 domains were expressed
in E. coli strain BL21 (Stratagene): cultures were grown
at 37°C overnight and then induced with 1 mM IPTG for
3 hours. Cells were centrifuged,
resuspended in PBS and lysed by sonication. The
cellular fractions were incubated with glutathione seph- 
arose (Thermo Scientific) and washed with PLC lysis 
buffer (50 mM Hepes pH 7.5, 150 mM NaCl, 10% gly- 
cerol, 1% Triton X-100). SH2 proteins were eluted using 
10 mM glutathione, 50 mM Tris-HCl pH 8.0 and purified
using the NAP-10 (Amersham Pharmacia) column
system. 

Peptide arrays 

The peptide libraries were synthesized onto an acid 
hardened amino-PEG500 cellulose membrane #UC540 
(Intavis, Germany) using an Intavis Multipep as described 
[41]. The estimated yield of peptide at each position was 
approximately 5 nmol. Addressable peptide arrays representing
physiological peptides comprised 192
peptides, each 11 amino acid residues long,
corresponding to tyrosine-containing peptides from
InsR, IGF-1R, IRS-1, IRS-2, FGFR1, FGFR2, FGFR3,
FRS-2, FRS-3, PLCγ1, p130Cas, and p62DOK1. Phosphotyrosine
residues were located at the fifth position in
singly phosphorylated peptides. In most cases Cys residues
were replaced with Ser. The membranes were stored
at -20°C until use. The membranes were deprotected
according to the manufacturer's instructions, using a 95%
TFA, 3% TIPS, 2% H2O cocktail for three hours.
Phosphotyrosine incorporation was assessed by incubation
with the anti-phosphotyrosine antibodies 4G10 (Upstate) and
pY20 (Santa Cruz). Additional file 1: Table S1 indicates
the array position, peptide sequence, protein source
position, and comments on related peptides and synthesis
problems.
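
The peptide layout described above (11 residues, tyrosine at the fifth position, Cys replaced with Ser) can be illustrated with a short sketch. The function name, the decision to skip tyrosines that lie too close to either terminus, and the demonstration sequence are assumptions made for illustration; the phosphate group itself is not encoded in the string, and Cys is replaced in every peptide although the text states this was done only in most cases.

```python
def tyrosine_peptides(sequence, before=4, after=6):
    """Return (site, peptide) pairs for every Tyr with a full 11-residue window.

    The Tyr sits at the fifth position (4 residues before, 6 after).
    Sites are 1-based. Tyr residues too close to a terminus are skipped,
    which is an assumption of this sketch.
    """
    peptides = []
    for i, residue in enumerate(sequence):
        if residue != "Y":
            continue
        start, end = i - before, i + after + 1
        if start < 0 or end > len(sequence):
            continue
        # Simplification: Cys -> Ser applied to every peptide.
        window = sequence[start:end].replace("C", "S")
        peptides.append((i + 1, window))
    return peptides


# Made-up sequence for illustration only (not a real protein).
demo = "MGSCEEYVNIDLAAKYCPQTR"
for site, peptide in tyrosine_peptides(demo):
    print(site, peptide)
```

For the made-up sequence, only the tyrosine at position 7 has a complete window; the tyrosine at position 16 is skipped because fewer than six residues follow it.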

SPOTs Analysis of SH2 domain specificities 

All steps were carried out at room temperature unless 
otherwise specified. The SPOTs membrane was first 
blocked with 5% nonfat milk in TBS-T (0.1 M Tris-HCl
(pH 7.4), 150 mM NaCl, and 0.1% Tween 20) overnight
at 4°C. GST alone or GST fusion proteins (0.25 µM)
were incubated with the SPOTs membrane in the same
buffer containing 1 mM DTT for 1½ hours at room
temperature, and the membrane was then washed with TBS-T.
Bound GST fusion proteins were detected with anti-GST
antibodies (Amersham), followed by incubation with
anti-goat Alexa Fluor 680 (Molecular Probes).
The array membrane was subsequently
washed four times with TBS-T for 10 min.
Peptides that bound the domain of interest were visua- 
lized by Li-Cor Odyssey using the 700 nm channel. In- 
tensities were calculated using a grid with 192 circular 
features of 2 mm diameter, each centered around a pep- 
tide spot to avoid scoring SPOTs with halo or rings. For 
each feature, the average (integrated) intensity was used 
for downstream analysis. 
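
The feature scoring described above can be sketched as a mean over the pixels falling inside a circle centered on each spot. The function name, the representation of the scan as a nested list, and the use of a radius in pixels are assumptions; converting the 2 mm feature diameter into pixels depends on the scan resolution, which is not stated here.

```python
def circular_feature_mean(image, center_row, center_col, radius_px):
    """Mean intensity of pixels lying within radius_px of the feature center."""
    total, count = 0.0, 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius_px ** 2:
                total += value
                count += 1
    if count == 0:
        raise ValueError("feature contains no pixels")
    return total / count


# Toy 5x5 scan (invented values): a spot in the middle, bright ring outside.
image = [
    [9, 9, 9, 9, 9],
    [9, 1, 2, 1, 9],
    [9, 2, 4, 2, 9],
    [9, 1, 2, 1, 9],
    [9, 9, 9, 9, 9],
]
spot = circular_feature_mean(image, center_row=2, center_col=2, radius_px=1)
print(spot)
```

Restricting the feature to a small circle keeps the outer ring of bright pixels out of the score, which is the purpose of the fixed-diameter features described in the text.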

Fluorescence polarization 

Peptides were synthesized using FMOC-chemistry onto 
pre-loaded tenta-gel resins. Peptides were then labeled 
with Rhodamine B (Abbey Color) and then cleaved using 
trifluoroacetic acid. Peptides were lyophilized and then 
purified using an LC/MS (Agilent 2100). Dissociation
constants were measured using the Beacon 2000 (Invitrogen)
as previously described [40].

Data analysis 

All analysis steps were performed as previously described 
[86]. Peptide intensity scores (excluding those defined as 
non-specific) were averaged across each 192-peptide array, 
producing an array mean. Array-positive binding was 
ascribed to interactions with intensities greater than three 
times the array mean. Peptide spots with average intensity 
values between 1X and 3X the array mean were defined as
'indeterminate'. Those with intensities below 1X the array mean
were defined as array-negative. Non-specific signal was
detected by probing three separate 192-peptide arrays with
three separate GST preparations at 0.25 µM. Non-specific
binding peptides were identified as those with signal intensities
greater than 3X the array mean in at least two of
three trials.
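
The thresholding rules above can be summarized in a minimal sketch. The function names, the dictionary representation of an array, and the toy intensities are assumptions made for illustration; a spot exactly equal to the array mean is assigned to 'indeterminate' here, a boundary case the text does not specify.

```python
def array_mean(intensities, nonspecific=frozenset()):
    """Mean intensity over all peptides not flagged as non-specific."""
    kept = [v for name, v in intensities.items() if name not in nonspecific]
    return sum(kept) / len(kept)


def classify(intensities, nonspecific=frozenset()):
    """Label each peptide spot relative to the array mean."""
    mean = array_mean(intensities, nonspecific)
    calls = {}
    for name, value in intensities.items():
        if name in nonspecific:
            calls[name] = "non-specific"
        elif value > 3 * mean:
            calls[name] = "positive"
        elif value >= mean:  # boundary handling is an assumption
            calls[name] = "indeterminate"
        else:
            calls[name] = "negative"
    return calls


def nonspecific_peptides(gst_arrays, min_trials=2):
    """Peptides above 3X the array mean in at least min_trials GST-only arrays."""
    hits = {}
    for intensities in gst_arrays:
        mean = array_mean(intensities)
        for name, value in intensities.items():
            if value > 3 * mean:
                hits[name] = hits.get(name, 0) + 1
    return {name for name, n in hits.items() if n >= min_trials}


# Toy data with six peptides instead of 192 (values are invented).
gst_trials = [
    {"p1": 100, "p2": 1, "p3": 1, "p4": 1, "p5": 1, "p6": 1},
    {"p1": 90, "p2": 1, "p3": 1, "p4": 1, "p5": 1, "p6": 1},
    {"p1": 1, "p2": 1, "p3": 1, "p4": 1, "p5": 1, "p6": 1},
]
nonspecific = nonspecific_peptides(gst_trials)
sh2_array = {"p1": 500, "p2": 80, "p3": 30, "p4": 6, "p5": 2, "p6": 2}
calls = classify(sh2_array, nonspecific)
print(calls)
```

In the toy data, p1 exceeds 3X the array mean in two of three GST-only trials and is therefore excluded before the array mean of the SH2 array is computed.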

Phosphorylation status and solvent exposed tyrosines 

The structure files of InsR (1IRK, 2DTG), IGF-1R (1P40),
IRS-1 (1IRS, 1QQG), FGFR1 (1FGK), FGFR2 (2PVF),
FRS2 (1XR0), p62DOK1 (2V76), and PLCG1 (1HSQ, 2HSP)
were collected from the Protein Data Bank (PDB) (www.rcsb.org).
Surface-accessible tyrosines were identified using the Gerstein
algorithm (http://helbcweb.nih.gov/structbio/). The phosphorylation
status of the 192 sites was determined using the
protein modification resource PhosphoSite
(http://www.phosphosite.org).

PSSMs and EDSM 

For each SH2 domain a position-specific scoring matrix
(PSSM) was calculated for the array-positive peptides
(posPSSM). A second PSSM was calculated for all
peptides, excluding those judged to be non-specific,
as the expected distribution of amino acids represented
on the array (exPSSM). Subtracting exPSSM
from posPSSM yields the expectation deviation scoring
matrix, or EDSM. The EDSM for each SH2 domain
was visualized as a logo of positive and negative
factors using WebLogo [69].
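
The subtraction described above can be sketched as follows. The sketch assumes that each PSSM holds simple per-position amino acid frequencies with no pseudocounts or log-odds transformation, which is an assumption rather than a description of the published procedure; the function names and the three-residue toy peptides are likewise illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm(peptides):
    """Per-position amino acid frequencies for equal-length peptides.

    Assumption: plain frequencies, no pseudocounts or log-odds scaling.
    """
    length = len(peptides[0])
    matrix = []
    for pos in range(length):
        column = [p[pos] for p in peptides]
        matrix.append(
            {aa: column.count(aa) / len(peptides) for aa in AMINO_ACIDS}
        )
    return matrix


def edsm(positive_peptides, all_peptides):
    """posPSSM minus exPSSM, position by position."""
    pos_pssm = pssm(positive_peptides)
    ex_pssm = pssm(all_peptides)
    return [
        {aa: pos_col[aa] - ex_col[aa] for aa in AMINO_ACIDS}
        for pos_col, ex_col in zip(pos_pssm, ex_pssm)
    ]


# Toy 3-residue "peptides" (invented); Y stands for the phosphotyrosine.
all_peptides = ["EYN", "DYN", "AYV", "KYV"]
positives = ["EYN", "DYN"]
matrix = edsm(positives, all_peptides)
print(matrix[2]["N"], matrix[2]["V"])
```

In the toy example, Asn at the last position is enriched among the positives (+0.5) and Val is depleted (-0.5), while the invariant Tyr scores zero, which is the kind of positive and negative factor a logo of the EDSM would display.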

EDSM clustering 

The unbiased position-specific expectation deviation
scoring matrix was expanded into a hyper-dimensional
vector representation, and the Euclidean distances
between vectors were computed. The resulting N-by-N
distance matrix was then clustered using the Fitch-Margoliash
method in the PHYLIP package [85]. The
unrooted tree was drawn using the MEGA package [86].
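
The vector expansion and distance calculation can be illustrated with a minimal sketch. The function names, the two-letter alphabet, and the toy matrices are assumptions chosen to keep the example small; the tree-building step performed in PHYLIP is not reproduced here.

```python
import math


def flatten(matrix, alphabet):
    """Concatenate the per-position score columns into one vector."""
    return [column[aa] for column in matrix for aa in alphabet]


def distance_matrix(vectors):
    """Symmetric N-by-N matrix of pairwise Euclidean distances."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Toy EDSMs over a two-letter alphabet and two positions (invented values).
alphabet = "DN"
edsms = {
    "SH2_a": [{"D": 0.5, "N": -0.5}, {"D": 0.0, "N": 0.0}],
    "SH2_b": [{"D": -0.5, "N": 0.5}, {"D": 0.0, "N": 0.0}],
    "SH2_c": [{"D": 0.5, "N": -0.5}, {"D": 0.0, "N": 0.0}],
}
names = list(edsms)
vectors = [flatten(edsms[name], alphabet) for name in names]
dist = distance_matrix(vectors)
for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

Domains with identical toy matrices (SH2_a and SH2_c) are at distance zero, while the domain with the opposite preference lies farthest away; a matrix of this form is the input that a distance-based tree method would then cluster.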

Reported interactions 

Reported peptide interactions were collected by search- 
ing HPRD and literature. Reported protein interactions 
were collected from the major protein-protein inter- 
action databases of MINT [50], BIND [51], HPRD [52], 
and DIP [53] using UniHI [49]. 

Cell lines and GST pull-downs

Chinese Hamster Ovary (CHO) cells stably overexpressing 
insulin receptor (InsR) and IRS-1 were graciously pro- 
vided by Xiao Jian Sun (UChicago). CHO cells were grown 
in DMEM/F12 supplemented with 10% fetal bovine 
serum, penicillin and streptomycin. CHO cells were serum 
starved for 24 hours and treated with or without insulin
(100 nM) for 5 min. Cells were lysed in HNTG (20 mM
Hepes pH 7.5, NaCl, 1% Triton X-100, 10% glycerol, 1 mM
Na3VO4) with protease inhibitors (1 mM PMSF, aprotinin
and leupeptin). Pre-cleared lysates were incubated with
GST-SH2 domains immobilized on glutathione beads and
rocked for 3 hours at 4°C. Activated InsR and IRS-1 were
detected using anti-phosphotyrosine 4G10 (Upstate).

Additional files 



Additional file 1: This table includes the position of the peptide on 
the array, peptide sequence, protein name, site of tyrosine 
phosphorylation and the status of phosphorylation based on 
Phosphosite (www.phosphosite.org). Information regarding whether 
the peptide spot is considered "non-specific" is indicated. SH2 domains 
that bound >3x and between 2x and 3x the mean are also listed. 

Additional file 2: Supplementary materials, including a detailed
description of GST background removal, supplemental tables and
supplemental figures.

Additional file 3: Complete list of SH2 domains tested on the
SPOT arrays. 